Reducing Run Queue Contention in Shared Memory Multiprocessors
Author
Abstract
No single method for mitigating the performance problems of centralized and distributed run queues is entirely successful. A hierarchical run queue succeeds by borrowing the best features of both.

Performance of parallel processing systems, especially large systems, is sensitive to various types of overhead and contention. Performance consequences may be serious when contention occurs for hardware resources such as memory or the interconnection network. Contention can also occur for software resources such as critical data structures maintained by either system or application software. A run queue is one such critical data structure that can affect overall system performance. There are two basic types of run queues, centralized and distributed, and both present performance problems. There are also several techniques to mitigate their drawbacks, but none is completely satisfactory. Instead, I propose a different run queue organization, a hierarchical organization that inherits the best features of the centralized and the distributed queue organizations while avoiding their pitfalls. Thus, the hierarchical organization is suitable for building large-scale multiprocessor systems.

Shared memory multiprocessors give programmers a single address space much like that of a traditional uniprocessor system. Processors communicate through shared-memory variables. Shared memory multiprocessors, often simply called multiprocessors (the term I use here), are evolving toward general-purpose multiuser systems.1 Multiprocessors provide either uniform memory access (UMA) or nonuniform memory access (NUMA). In UMA multiprocessors, the cost of accessing a memory location is the same for any processor in the system. In NUMA multiprocessors, memory access cost varies. Generally, a UMA architecture is good for systems with tens of processors, while a NUMA architecture lets us build systems with hundreds of processors.

Figure 1 illustrates a UMA multiprocessor, in which the shared memory is global to all processors. An interconnection network facilitates communication between the processors and the global shared memory. Typically, UMA multiprocessors use a single bus as the interconnection network. The Sequent Symmetry and Encore Multimax are examples of commercial bus-based UMA systems. Using a common, limited-bandwidth bus as an interconnection network severely restricts system scalability. Furthermore, this type of interconnection network allows only one processor at a time to communicate with the memory, leading to performance degradation. To avoid this problem, the shared memory is divided into memory modules, as Figure 1 shows. Typically, UMA multiprocessors allow concurrent access to all memory modules, as long as there are processors requesting access to memory modules and no two processors wish …
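The tradeoff the abstract describes can be made concrete with a small sketch. The C fragment below is an illustration only, not the article's implementation: it contrasts a centralized run queue, where every processor serializes on one lock, with distributed per-processor queues, where lock contention is low but an idle processor must search other queues for work. All names (task_t, rq_t, NCPUS, pick_next) are invented for this example.

```c
/* Illustrative sketch (not from the article): centralized vs. distributed
 * run queues.  Call rq_init() on each queue before use. */
#include <pthread.h>
#include <stddef.h>

#define NCPUS 16

typedef struct task {
    struct task *next;
} task_t;

typedef struct {
    pthread_mutex_t lock;     /* serializes every enqueue and dequeue */
    task_t *head, *tail;
} rq_t;

static void rq_init(rq_t *q)
{
    pthread_mutex_init(&q->lock, NULL);
    q->head = q->tail = NULL;
}

static void rq_enqueue(rq_t *q, task_t *t)
{
    pthread_mutex_lock(&q->lock);
    t->next = NULL;
    if (q->tail) q->tail->next = t; else q->head = t;
    q->tail = t;
    pthread_mutex_unlock(&q->lock);
}

static task_t *rq_dequeue(rq_t *q)
{
    pthread_mutex_lock(&q->lock);
    task_t *t = q->head;
    if (t) {
        q->head = t->next;
        if (q->head == NULL) q->tail = NULL;
    }
    pthread_mutex_unlock(&q->lock);
    return t;
}

/* Centralized organization: one queue shared by all processors.  Every
 * processor would call rq_dequeue(&global_rq), so every scheduling
 * decision contends for global_rq.lock as the system grows. */
static rq_t global_rq;

/* Distributed organization: one queue per processor.  Contention on any
 * single lock is low, but an idle processor must scan other queues, and
 * the load can become unbalanced. */
static rq_t local_rq[NCPUS];

static task_t *pick_next(int cpu)
{
    task_t *t = rq_dequeue(&local_rq[cpu]);            /* local queue first */
    for (int i = 1; t == NULL && i < NCPUS; i++)
        t = rq_dequeue(&local_rq[(cpu + i) % NCPUS]);  /* then search others */
    return t;                                          /* NULL if no work */
}
```

The hierarchical organization the article proposes sits between these two extremes; roughly, one can picture the run queues arranged in a tree, so that a processor usually finds work in a queue shared by only a few nearby processors and contends on more widely shared queues only when the local ones are empty. The precise structure and placement policies are those of the article, not of this sketch.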
Similar Resources
Reducing Contention for Run Queue in Shared-Memory Multiprocessor Systems
Performance of parallel processing systems is sensitive to various hardware and software overheads and to contention for hardware and software resources. Hardware resources such as the interconnection network and memory introduce communication contention and memory contention that can seriously impact overall system performance. Software resources include critical data structures maintained by appli...
A Skiplist-Based Concurrent Priority Queue with Minimal Memory Contention
Priority queues are fundamental to many multiprocessor applications. Several priority queue algorithms based on skiplists have been proposed, as skiplists allow concurrent accesses to different parts of the data structure in a simple way. However, for priority queues on multiprocessors, an inherent bottleneck is the operation that deletes the minimal element. We present a linearizable, lock-fre...
The Impact of Exploiting Instruction-Level Parallelism on Shared-Memory Multiprocessors
Current microprocessors incorporate techniques to aggressively exploit instruction-level parallelism (ILP). This paper evaluates the impact of such processors on the performance of shared-memory multiprocessors, both without and with the latency-hiding optimization of software prefetching. Our results show that, while ILP techniques substantially reduce CPU time in multiprocessors, they are les...
A Study of the Effect of Prefetching in Shared-Memory Resource Contention
Managing contention for shared resources in chip multiprocessors has become very challenging as the number of cores and execution contexts scale up. Contention for the memory hierarchy resources, especially the shared caches, can severely degrade an application’s performance and system throughput [2]. One of the important resources related to caching is the hardware prefetcher, whose effect on the s...
Eager Combining: a Coherency Protocol for Increasing Effective Network and Memory Bandwidth in Shared-Memory Multiprocessors
One common cause of poor performance in large-scale shared-memory multiprocessors is limited memory or interconnection network bandwidth. Even well-designed machines can exhibit bandwidth limitations when a program issues an excessive number of remote memory accesses or when remote accesses are distributed non-uniformly. While techniques for improving locality of reference are often successful...
Journal: IEEE Computer
Volume: 30, Issue: -
Pages: -
Publication year: 1997